基于注意力的神经网络在许多AI任务中都普遍存在。尽管其出色的算法性能,但注意力机制和前馈网络(FFN)的使用仍需要过多的计算和内存资源,这通常会损害其硬件性能。尽管已经引入了各种稀疏变体,但大多数方法仅着重于缓解算法级别上的二次注意力缩放,而无需明确考虑将其方法映射到真实硬件设计上的效率。此外,大多数努力仅专注于注意机制或FFN,但没有共同优化这两个部分,导致当前的大多数设计在处理不同的输入长度时缺乏可扩展性。本文从硬件角度系统地考虑了不同变体中的稀疏模式。在算法级别上,我们提出了Fabnet,这是一种适合硬件的变体,它采用统一的蝴蝶稀疏模式来近似关注机制和FFN。在硬件级别上,提出了一种新颖的适应性蝴蝶加速器,可以在运行时通过专用硬件控件配置,以使用单个统一的硬件引擎加速不同的蝴蝶层。在远程 - ARENA数据集上,FabNet达到了与香草变压器相同的精度,同时将计算量减少10到66次,参数数量为2至22次。通过共同优化算法和硬件,我们的基于FPGA的蝴蝶加速器在归一化到同一计算预算的最新加速器上达到了14.2至23.2倍的速度。与Raspberry Pi 4和Jetson Nano上优化的CPU和GPU设计相比,我们的系统在相同的功率预算下的最大273.8和15.1倍。
translated by 谷歌翻译
使用本机LUT作为独立培训推理运营商的FPGA特定的DNN架构已被证明实现了有利的区域准确性和能量准确性权衡。该领域的第一个工作Lutnet,对标准DNN基准测试表现出最先进的性能。在本文中,我们提出了学习的基于LUT的拓扑结构的优化,从而导致更高效率的设计,而不是通过直接使用现成的手工设计的网络。本类架构的现有实现需要手动规范的每拉特的输入数,K。选择合适的k先验是具有挑战性的,并且在甚至高粒度下这样做,例如,如此。每个层,是一种耗时和错误的过程,可以留下FPGA的空间灵活性欠缺。此外,先验工作请参阅随机连接的LUT输入,不保证网络拓扑的良好选择。为了解决这些问题,我们提出了逻辑收缩,一种细粒度的网格剪枝方法,使K将自动学习,用于针对FPGA推理的神经网络中的每一个LUT。通过删除确定为低于重要性的LUT输入,我们的方法会增加所得加速器的效率。我们的GPU友好的LUT输入拆卸解决方案能够在培训期间加工大型拓扑,可忽略不计的放缓。通过逻辑收缩,我们可以分别更好地完成CNV网络的最佳Lutnet实现的区域和能源效率,分别将CIFAR-10分别达到1.54倍和1.31倍,同时匹配其精度。该实现也达到2.71倍的区域效率同样准确,严重修剪的BNN。在具有双重净架构的Imagenet上,逻辑收缩的就业导致综合后面积减少2.67倍VS Lutnet,允许以前在今天最大的FPGA上实现的实施。
translated by 谷歌翻译
The field of autonomous mobile robots has undergone dramatic advancements over the past decades. Despite achieving important milestones, several challenges are yet to be addressed. Aggregating the achievements of the robotic community as survey papers is vital to keep the track of current state-of-the-art and the challenges that must be tackled in the future. This paper tries to provide a comprehensive review of autonomous mobile robots covering topics such as sensor types, mobile robot platforms, simulation tools, path planning and following, sensor fusion methods, obstacle avoidance, and SLAM. The urge to present a survey paper is twofold. First, autonomous navigation field evolves fast so writing survey papers regularly is crucial to keep the research community well-aware of the current status of this field. Second, deep learning methods have revolutionized many fields including autonomous navigation. Therefore, it is necessary to give an appropriate treatment of the role of deep learning in autonomous navigation as well which is covered in this paper. Future works and research gaps will also be discussed.
translated by 谷歌翻译
We introduce a machine-learning (ML)-based weather simulator--called "GraphCast"--which outperforms the most accurate deterministic operational medium-range weather forecasting system in the world, as well as all previous ML baselines. GraphCast is an autoregressive model, based on graph neural networks and a novel high-resolution multi-scale mesh representation, which we trained on historical weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF)'s ERA5 reanalysis archive. It can make 10-day forecasts, at 6-hour time intervals, of five surface variables and six atmospheric variables, each at 37 vertical pressure levels, on a 0.25-degree latitude-longitude grid, which corresponds to roughly 25 x 25 kilometer resolution at the equator. Our results show GraphCast is more accurate than ECMWF's deterministic operational forecasting system, HRES, on 90.0% of the 2760 variable and lead time combinations we evaluated. GraphCast also outperforms the most accurate previous ML-based weather forecasting model on 99.2% of the 252 targets it reported. GraphCast can generate a 10-day forecast (35 gigabytes of data) in under 60 seconds on Cloud TPU v4 hardware. Unlike traditional forecasting methods, ML-based forecasting scales well with data: by training on bigger, higher quality, and more recent data, the skill of the forecasts can improve. Together these results represent a key step forward in complementing and improving weather modeling with ML, open new opportunities for fast, accurate forecasting, and help realize the promise of ML-based simulation in the physical sciences.
translated by 谷歌翻译
Low Earth Orbit (LEO) constellations, each comprising a large number of satellites, have become a new source of big data "from the sky". Downloading such data to a ground station (GS) for big data analytics demands very high bandwidth and involves large propagation delays. Federated Learning (FL) offers a promising solution because it allows data to stay in-situ (never leaving satellites) and it only needs to transmit machine learning model parameters (trained on the satellites' data). However, the conventional, synchronous FL process can take several days to train a single FL model in the context of satellite communication (Satcom), due to a bottleneck caused by straggler satellites. In this paper, we propose an asynchronous FL framework for LEO constellations called AsyncFLEO to improve FL efficiency in Satcom. Not only does AsynFLEO address the bottleneck (idle waiting) in synchronous FL, but it also solves the issue of model staleness caused by straggler satellites. AsyncFLEO utilizes high-altitude platforms (HAPs) positioned "in the sky" as parameter servers, and consists of three technical components: (1) a ring-of-stars communication topology, (2) a model propagation algorithm, and (3) a model aggregation algorithm with satellite grouping and staleness discounting. Our extensive evaluation with both IID and non-IID data shows that AsyncFLEO outperforms the state of the art by a large margin, cutting down convergence delay by 22 times and increasing accuracy by 40%.
translated by 谷歌翻译
A Complete Computer vision system can be divided into two main categories: detection and classification. The Lane detection algorithm is a part of the computer vision detection category and has been applied in autonomous driving and smart vehicle systems. The lane detection system is responsible for lane marking in a complex road environment. At the same time, lane detection plays a crucial role in the warning system for a car when departs the lane. The implemented lane detection algorithm is mainly divided into two steps: edge detection and line detection. In this paper, we will compare the state-of-the-art implementation performance obtained with both FPGA and GPU to evaluate the trade-off for latency, power consumption, and utilization. Our comparison emphasises the advantages and disadvantages of the two systems.
translated by 谷歌翻译
Automatic segmentation is essential for the brain tumor diagnosis, disease prognosis, and follow-up therapy of patients with gliomas. Still, accurate detection of gliomas and their sub-regions in multimodal MRI is very challenging due to the variety of scanners and imaging protocols. Over the last years, the BraTS Challenge has provided a large number of multi-institutional MRI scans as a benchmark for glioma segmentation algorithms. This paper describes our contribution to the BraTS 2022 Continuous Evaluation challenge. We propose a new ensemble of multiple deep learning frameworks namely, DeepSeg, nnU-Net, and DeepSCAN for automatic glioma boundaries detection in pre-operative MRI. It is worth noting that our ensemble models took first place in the final evaluation on the BraTS testing dataset with Dice scores of 0.9294, 0.8788, and 0.8803, and Hausdorf distance of 5.23, 13.54, and 12.05, for the whole tumor, tumor core, and enhancing tumor, respectively. Furthermore, the proposed ensemble method ranked first in the final ranking on another unseen test dataset, namely Sub-Saharan Africa dataset, achieving mean Dice scores of 0.9737, 0.9593, and 0.9022, and HD95 of 2.66, 1.72, 3.32 for the whole tumor, tumor core, and enhancing tumor, respectively. The docker image for the winning submission is publicly available at (https://hub.docker.com/r/razeineldin/camed22).
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
With an increasing amount of data in the art world, discovering artists and artworks suitable to collectors' tastes becomes a challenge. It is no longer enough to use visual information, as contextual information about the artist has become just as important in contemporary art. In this work, we present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies. In this approach, we first continue to pre-train the existing general English language models with a large amount of unlabelled art-related data. We then fine-tune this new pre-trained model with our biography pair dataset manually annotated by a team of professionals in the art industry. With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score and outperforms other baseline models. We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.
translated by 谷歌翻译
It can be easy and even fun to sketch humans in different poses. In contrast, creating those same poses on a 3D graphics "mannequin" is comparatively tedious. Yet 3D body poses are necessary for various downstream applications. We seek to preserve the convenience of 2D sketching while giving users of different skill levels the flexibility to accurately and more quickly pose\slash refine a 3D mannequin. At the core of the interactive system, we propose a machine-learning model for inferring the 3D pose of a CG mannequin from sketches of humans drawn in a cylinder-person style. Training such a model is challenging because of artist variability, a lack of sketch training data with corresponding ground truth 3D poses, and the high dimensionality of human pose-space. Our unique approach to synthesizing vector graphics training data underpins our integrated ML-and-kinematics system. We validate the system by tightly coupling it with a user interface, and by performing a user study, in addition to quantitative comparisons.
translated by 谷歌翻译